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computable,  estimator  coefficients  is  studied.  Several  examples  and  special 
cases  are  studied  including  additive  independent  noise,  nonlinear  distortion 
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I.  INTRODUCTION 


The  problem  of  estimating  a  weighted  integral  of  a  random  process  from 
observations  of  the  process  at  a  finite  number  of  sampling  points  has  been 
studied  by  several  authors  (see  the  survey  [2]).  It  is  an  important  problem 
of  interest  in  several  areas  of  conmunications,  information  theory,  statistics, 
and  signal  processing.  The  usual  questions  of  interest  are  to  find  the  optimal 
sampling  design  of  size  n,  or  sampling  designs  which  are  asymptotically  optimal 
as  the  sample  size  tends  to  infinity.  Coupled  with  these  is  the  problem  of 
estimator  design  and  the  study  of  how  the  mean  square  estimation  error  tends 
to  zero  as  the  sample  size  tend  to  infinity. 

In  this  paper  we  consider  these  problems  for  the  case  where  the  observa¬ 
tions  are  corrupted  by  noise.  We  allow  the  noise  to  be  possibly  dependent 
upon  the  random  process  whose  integral  we  are  trying  to  estimate,  henceforth 
called  the  signal  process.  In  this  case  as  the  number  of  sampling  points 
increases  to  infinity  the  mean  square  approximation  error  no  lonqer  tends  to 
zero  but  instead  to  some  positive  least  possible  value.  We  consider  estimators 
which  use  optimal  coefficients  as  well  as  suboptimal  (but  simple)  coefficients. 

As  far  as  the  authors  are  aware,  the  only  case  of  noisy  observations 
considered  in  the  literature  is  in  [5,6],  where  the  observation  noise  is 
assumed  white  and  the  signal  Gauss-Markov .  The  optimal  sampling  designs  are 
determined  in  [5]  and  the  rates  of  convergence  of  the  mean  square  estimation 
error  are  found  in  [6]  to  be  1/n  with  noise  and  1/n  with  no  noise. 

One  of  the  main  contributions  of  this  paper  is  to  show  that  these  mean 
square  estimation  errors  and  their  rate  of  convergence  to  least  possible 
values  depend  crucially  on  the  solution  of  a  certain  Wiener-Hcof  integral  equa¬ 
tion.  If  the  solution  to  the  integral  equation  is  smooth  and  contains  at  most 


Dirac  delta  functions,  but  not  derivatives  of  delta  functions,  then 
asymptotically  optimal  sampling  designs  can  be  chosen  for  both  kinds  of 
estimators.  Otherwise  the  rate  of  convergence  of  the  optimal  coefficient 
estimator  is  not  known.  Fortunately  the  rate  of  convergence  of  simple 
coefficient  estimators  can  still  be  found  even  in  this  case,  but  their 
asymptotic  mean  square  estimation  error  is  not  the  least  possible,  even  though 
it  can  be  made  arbitrarily  close. 

In  Section  II  we  develop  the  general  set  up  and  solution  to  the  problem. 

In  Section  III  we  consider  in  more  detail  the  cases  of  additive  observation 
noise,  of  nonlinear  signal  distortion  plus  additive  noise,  and  of  quantization 
noise.  In  particular  we  see  that  while  additive  noise  of  comparable  smoothness 
with  the  signal  does  not  affect  the  rates  of  convergence  (but  does  affect  the 
asymptotic  constants),  quantization  noise  also  reduces  the  convergence  rate. 

Throughout  the  paper  we  consider  in  detail  the  case  of  random  processes 
with  no  quadratic  mean  derivative,  both  for  simplicity  of  exposition  and  be¬ 
cause  several  questions  remain  still  unresolved  when  quadratic  derivatives  of 
order  two  and  higher  exist  (the  case  of  only  one  quadratic  mean  derivative  be¬ 
ing  similar  to  that  of  no  quadratic  mean  derivative). 

In  this  paper  we  consider  only  two  kinds  of  (nonrandom)  sampling  design. 
They  generalize  periodic  sampling  which  includes  the  endpoints  a  and  b,  and  per¬ 
iodic  sampling  which  does  not  include  the  endpoints  but  is  symmetrically  spaced 
in  the  observation  interval.  Choose  a  continuous,  positive  probability  density 
h  on  the  interval  [a,b].  Regular  sampling  chooses  for  each  n  as  the  n  sampling 
points  Tp  =  (t^,...,  tnn)  all  (n-1)"^  percentiles  of  h: 


3 


and  we  refer  to  { Tn >  as  a  sequence  of  regular  sampling  designs  generated  by  the 
density  h.  Median  sampling  chooses  for  each  n  as  the  n  sampling  points  = 

{t  -|,...,t  )  the  medians  of  a  regular  sequence  of  designs: 


nk 


h(t)  dt  =  ,  k=l  ,2, . . . ,n. 


and  we  refer  to  {Tn}as  a  sequence  of  median  sampling  designs  generated  by  the 
density  h.  Uhen  h  is  the  uniform  density,  both  regular  and  median  sampling  be¬ 
come  periodic,  the  former  including  the  interval  endpoints  while  the  latter  does  not. 
We  will  use  the  following  notation  in  order  to  simplify  the  text.  With 
T  =  (t, ,...,t  )  an  n-point  subset  of  [a,b]  and  with  functions  f(t)  defined  on 
[a,b]  and  R(s,t)  defined  on  [a,b]*[a,b],  we  will  write  f!j  for  the  n-vector 

(f(t, ) , . . . ,f(t  ))  and  R_  for  the  nxn  matrix  {R( t - ,t.)}?  .  , .  We  also  frequently 
i  n  i  i  j  1 9  j  i 

delete  the  range  of  integration  as  well  as  the  argument,  writing  e.g.  //Rff 
for  jj^R(s,t)f ( s)f(t)ds  dt. 
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II.  GENERAL  CASE 


We  consider  the  problem  of  estimating  the  weighted  integral  of  a  random 


process  X  =  { X ( t ) ,  a  <  t  <  b} ; 


I  =  X(t)  f(t)dt 


from  "noisy"  observations  of  the  random  process  Y  =  { Y ( t) ,  a  <  t  <  b}  at  n  sample 
points  T  =  (tk)[!=-|.  The  processes  X  and  Y  are  assumed  to  have  continuous 
correlation  functions  Rx(s,t)  and  Ry(s,t)  and  cross  correlation  function  Ryx(s,t), 
and  the  weighting  function  f  is  assumed  to  be  continuous.  We  restrict  attention 
to  linear  estimates  with  weights  Cy  =  (Cy  ^,...,Cy  n): 

TT  =k^  CT,kY(tk)  =  CT  YT 
whose  mean  square  approximation  error  is 


where 


ey  =  E(I-Iy)2  =  a2  -  2CygT  +  c^RYJCy 


a2  =  El2  =  ff  Rxff, 


s(t)  =  j  Rx(t,u)  f(u)  du. 


(s  =  Rxf), 


g(t)  =  j  RyX(t,u)  f(u)  du, 


(g  =  Ryx^) 


If  the  observation  process  Y  could  be  observed  over  the  entire  interval 

o 

[a,b]  then  the  minimum  mean  square  approximation  error  would  be  achieved 

A 

by  the  projection  I  of  I  onto  the  linear  span  of  Y,  which  is  determined  by 


E[I  Y ( t ) ]  =  E [ I  Y(t)] 
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for  all  a  _<  t  £  b,  or  equivalently  by 

g(t)  =  (Ryxf)  (t)  =  E[I  Y{ t ) 3 - 

It  follows  that  g  belongs  to  the  reproducing  kernel  Hilbert  space  (RKHS)  R(Ry ) 
of  Y  and 

C*.EI2-e12.o*-||9|||(s)  .  (3) 

Of  course  when  noiseless  observations  are  available,  i.e.  when  Y(t)  =  X(t), 

2  2 
then  g  =  s  and  =  0.  In  the  general  case  we  always  have  >  0. 

Our  goal  is  to  choose  the  sampling  points  T  and  the  estimator  weights  c^ 

2 

in  such  a  way  that  the  resulting  mean  square  estimation  error  e^  should  be  as 
2 

close  to  em  as  possible. 


For  a  fixed  sample  T,  the  optimal  coefficients  c^  are  those  which  minimize 

2 

the  mean  square  approximation  error  eT  of  (1),  or  equivalently  those  which  make 
cjY.j.  the  projection  of  I  onto  the  linear  span  of  Y^.  They  are  given  by 

cy  =  9j  Ry^y  ,  and  thus  the  optimal  estimator  and  its  mean  square  aporoximation 
error  are 

't  =  9t  Ry!tVT  •  <«) 

€T  ■  °2  '  gT  Ry!t  9T  "  °2  •  I  I PT9 1  I  ZR(Ry)  ’ 
where  PTg  is  the  projection  of  g  to  the  subspace  of  R(Ry)  generated  by 
(Ry ( *  »t) ,  teT) . 
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The  optimal  sampling  design  of  size  n  (if  it  exists)  thus  maximizes 


! Pjg 1 1  p> ( p  )  over  sampling  designs  An  of  size  n:  T  =  (a  <  t^  < 


b) 


The  performance  of  the  optimal  designs  tends  to  ek  as  the  sample  size  tends  to 
infinit'',  si"ce 


i"f  c2  =  <,2  -  sup  I |PTgl ||(Ry)  -  °2  -I Igl li(R  )  =  i  ■ 

'n  ‘  n 

Optimal  sampling  designs  may  not  exist,  and  even  when  they  exist  it  may  be 
difficult  to  determine  them. 

We  now  consider  regular  sequences  { Tn }  of  sampling  designs  Tn  generated 

by  a  density  h,  and  write  I  and  e  for  I_  and  eT  .  As  the  sampling  size  n 

r,n  r,n  'n 

increases  they  satisfy 


1  1 9  -  PTq I  I 
n 


2 

R(Ry) 


Precise  rates  of  convergence  follow  from  the  work  of  Sacks  and  Ylvisaker  [9,10,11] 
in  certain  cases  where  the  observation  process  Y  has  exactly  k  quadratic  mean 
derivatives,  under  the  additional  assumption  that  the  function  g  of  (2),  which 
is  in  the  RKHS  of  Ry,  actually  belongs  to  the  smaller  space  r(Ry),  the  range  of 
the  integral  type  operator  with  kernel  Ry,  i.e. 


g(t)  = 


Ry  (t,u)d>(u)  du 


(g  =  Ry<f>) , 


(5) 


with  a  continuous  function.  Specifically,  under  certain  regularity  conditions. 


rb  oty  (t)<t>  (t) 

i  JLtk _ 


"2k+2<4  -2)  *  ck  - «. 

n  n 


'a  tr*2(t) 


(6) 


where  .*y^k  is  the  jump  along  the  diagonal  of  the  derivative  of  the  (k,k)  partial 
derivative  of  Ry,  «Y>k(t)  =  Ry,k+1 (t,t  -  0)  -  Ry ’k+1 (t,t  +  0)  >  0  (superscripts  denoting 
partial  derivatives).  The  regularity  conditions  are  specified  e.g.  in  [11]  and 
need  not  be  repeated  here;  it  should  be  noted,  however,  that  for  k  '2  this  result 


has  not  yet  been  established  for  as  broad  a  class  of  covariances  Ry  as  when  k  =  0,l 
(cf.  Section  6.1  in  [2]).  By  choosing  the  density  h*(t)  which  minimizes  the  right 
hand  side  of  (6),  i.e.  proportional  to  [ay  |c(t)<j>2(t)]1^2k+'^ ,  we  obtain  a  regular 
sequence  {T*}  of  sampling  designs  which  is  asymptotically  optimal,  i.e.  satisfies 


2 

r*,n 


inf  c. 

Tr  A 


-  1  , 
n 


and 


2k+2,  2  2x  .  , 

n  O  ;  V 


1 


[ay5k(t)<r(t)fK  °dt) 


The  regularity  conditions  are  satisfied  by  stationary  processes  with  rational 
spectral  densities,  the  stationary  process  with  triangular  covariance,  the 
Wiener  process,  etc.,  and  in  all  these  cases  the  jumps  of  the  covariance 

derivative  are  constant.  The  value  of  the  constant  is  k+2 1 / ( 2 k  +  2 ) ! 

where  3m  is  the  mth  Bernoulli  coefficient,  and  Cq  =  1/12,  C-j  =  1/720  (see  the  dis¬ 
cussion  in  [2,  p.  351]). 

For  simplicity  we  will  consider  from  now  on  examples  with  no  quadratic 
mean  derivative,  i.e.  k  =  0,  such  as  the  stationary  Gauss-Markov  process,  the 
stationary  process  with  triangular  covariance,  the  Wiener  process,  in  which  case 
we  have 


nV  -E2) 

r,n  <■' 


1 

T7 


h2(t) 


dt. 


(7) 


An  asymptotically  optimal  reqular  sequence  of  designs  { T* }  is  generated  by 
the  density  h*(t)  proportional  to  [ay  0( t)d^2( t) ] 1  /3 ,  with 


n2(-  2  -,2) 

v  r*n  - ' 


tp  f;  K  n(th2(tn1/3>3dt. 


(8) 
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The  VJiener-Hopf  Eg.  (5) 

From  (2)  and  (5)  it  follows  that  a  sufficient  condition  for  these  precise 
rates  is  the  existence  of  a  continuous  function  .j>  such  that 

fb  ,  b 

Rv(t,u)lj-(u)du  =  I  RVY(t,u)f(u)du  ( =q(  t ) ) ,  (9) 

'a  Y  'a  YX 

a<t<b,  or  Ry4>=Ry^f  (=9).  It  may  be  desirable  to  first  check  the  existence 
of  a  square  integrable  solution,  and  then  check  its  continuity.  In  this  connec¬ 
tion  it  is  of  interest  to  note  that  a  square  integrable  solution  <j>  of  the  l.'iener- 
Hopf  integral  equation  (9)  will  exist  for  every  square  integrable  f,  if  (and  only 
if)  any  of  the  following  equivalent  conditions  are  satisfied: 


(i)  r(RYX)  r(Ry), 

2  2 

(ii)  ||RyXe|!  •_  C | | Rye | |  for  all  square  integrable  functions  e 
and  some  finite  constant  C  (where  the  norm  is  L^), 


(iii)  for  some  finite  constant  C» 


C 


b 

Ry(t,u)RY(s,u)du  - 
a  T 


■b 

RYX(t,u)RYX(s,u)du 


is  a  nonnegative  definite  function  of  t,J5 


(iv)  the  minimum  mean  square  error  linear  estimate  I  of  I  based  on 
{ Y ( t ) ,  a  <_  t  <_  b)  is  of  the  form  (with  c  L^), 


I 


b 

Y(  t)d>( t)dt . 
a 


In  this  case  the  expression  of  the  asymptotic  mean  square  error  (3)  can  be  writ¬ 


ei=  URxff  -  j/v*- 


ten  as  follows: 


Simple  Coefficients 

Here  we  consider  the  very  simple  choice  of  (non-optimal )  coefficients  of 

the  following  form:  for  each  samole  of  size  n,  T  =  { t  ,  ) P  ,  ,  we  take 
3  n  nk  k=l 

cT  =  -  c(t  ,  ) 

Tn,k  "  nk 

for  some  continuous  function  c(t).  Thus  the  coefficient  cT  of  Y(t  ,) 

Tn,k  nk 

depends  only  on  the  sample  point  t  ,  via  an  appropriate  function  c,  unlike  the 

n  k 

optimal  coefficients  which  depend  on  the  entire  sample  T  . 

For  a  sequence  of  median  sampling  designs  { }  generated  by  the  density 


h  we  then  have 


I  =-  c  t  .  Y  t  ,  =  -  c;  YT  , 
m,n  n  ,, nk7  nk'  n  T  T 

k=l  n  n 


em,n  ■  E'1-'™  ,n>2 


=  !  !  RXff  "  n  CT  \  CT  RY,T  CT  ’ 
n  n  n  n  n  n 


and  by  Lemma  2  in  [4], 


1  im  e 


Rxff  -2  /  chg  +  /  /  RY(ch)(ch) 

Rxff  -  2  /  /  Ryx(ch)(f)  +  /  /  Ry(ch) (ch)  (10) 


=  E  (  /  Xf  -  /  Ych;  . 


A  reasonable  way  of  choosing  the  weighting  function  c  is  by  minimizing 

2 

the  limiting  value  of  the  mean  square  error  em  oo.  It  follows  from  (10)  that 

2  2  ~  ’ 
e,n  =  r,„  ^  and  only  if  /Ych  =  I  or  equivalently  (by  (iv)  in  the  precedinn 

subsection)  if  Ry4>  =  Ry^f  has  an  solution  <j>,  in  which  case  c  is  determined 

by 

c(t)h(t)  =  <j>(t) ,  a  <  t  <  b.  (11 ) 


Also  any  smoothness  requirements  imposed  on  c,  such  as  continuity  (which  has  al¬ 
ready  been  assumed)  or  twice  continuous  differentiability  (which  will  be  required 
shortly)  would  have  to  be  satisfied  as  well  by  the  solution  <p  to  the  Wiener- 
Hoof  integral  equation.  When  Ryj>  =  Ry^f  does  not  have  a  continuous  (resp.  twice 
continuously  differentiable)  or  even  an  L 2  solution  <j>,  then  one  can  find  a  con¬ 
tinuous  (resp.  twice  continuously  differentiable)  c  with  corresponding  mean  square 
2  2 

error  em  ^  exceeding  cm  by  an  arbitrarily  small  amount  (and  in  the  latter  case 
no  minimizing  function  c  exists).  This  is  because  random  variables  of  the 
form  /Yip  with  ip  continuous  (resp.  twice  continuously  differentiable)  form  a 
dense  set  in  the  linear  space  of  Y,  so  that  given  any  6 >  0  we  can  find  a  continu- 
ous  (resp.  twice  continuously  differentiable)  ip  such  that  E ( I  -  /Yip )  <6  and 
choosing  ch  =  ip  we  have 


em,„  '  E1/2(/Xf-/V»)2 

<  EI/z(/Xf  -  I)2  +  E/z(\-/Yip)2 

<  c  +6. 

OO 


We  are  thus  lead  to  consider  the  following  two  cases. 


Case  1 .  Ry4>  =  RyXf  has . 
Then  c  is  chosen  by  (11)  and 


1-2  solution  <p. 
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Assuming  the  observation  process  Y  has  no  quadratic  mean  derivatives  (k-0). 


Ry  satisfies  the  same  regularity  conditions  required  for  (7),  and  d>/h  is  twice 


continuously  differentiable,  we  obtain  from  [4,  p.  94], 


2,  2  2, 
n  (e  -  e  ) 
'  m,n 


1  fb  ay  Q(t)/(t) 


n  12, a 


h^7 


dt 


(13) 


2  1 

Thus  choosing  h*(t)  proportional  to  [«Y  q( t ) g>  (t)]  ,  we  obtain  a  sequence 


of  median  sampling  designs  {T*}  whose  corresponding  estimators  { 1^  n}are 


asymptotically  optimal.  Comparing  the  asymptotically  optimal  sequence  of 
estimators  using  median  sampling  and  nonoptimal  coefficients  with  the  asympto¬ 
tically  optimal  sequence  of  estimators  using  regular  sampling  and  optimal 
coefficients,  we  see  that  in  both  cases  the  design  is  determined  by  the  same 
density  h*(t),  while  the  estimator  coefficients  require  solving  an  inteqral 
equation  in  the  former  case  and  inverting  an  n*n  matrix  in  the  latter. 


Case  2.  RYg>  =  Ry^f  has  no  twice  continuously  differentiable  solution  <j>. 
In  this  case 


2  2  2 

e  „  em  >  e 
m,n  m,°°  » 


with  strict  inequality  for  all  choices  of  twice  continuously  differentiable  c. 
Thus  the  asymptotic  performance  is  always  inferior  to  that  achieved  by  using 


optimal  coefficients.  It  turns  out,  however,  that  the  rate  of  convergence  of 
2 


em  n  can  be  found,  while  under  the  present  conditions  no  rates  are  generally 
known  for  optimal  coefficient  estimators. 
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To  see  this  we  write 
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em,n 


=  { 
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'T  Y,T  T 


n  n 


If  Ry(ch) (ch) } 


-2{]-  c j  (Ryxf)T  -  If  RYX(ch)(f)}  . 
n  n 
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Assuming  as  in  Case  1  that  Y  has  no  quadratic  mean  derivative  (k  =  0),  that  c 
and  f/h  are  twice  continuously  differentiable,  that  Ry  satisfies  the  same  re¬ 
gularity  conditions  required  for  (7),  and  that  Ryx  satisfies  similar  regularity 
conditions,  we  show  in  Appendix  A  that 


n 2(e?  „  -  e!  J 


i?/^av.nc  "  evx  h  +  “  ^vchJ 


m,n "  Km,°°'  n  12JLUY,CT  pyX  h  '  YX 


(13) 


where 


6yx(t)  =  [Rvy°(t-,t)  -  Rvi’°(t+,t)]  -  [Rvy^t.t-)  -  R$y]  (t,t+)]. 


'YX 


'YX 


'YX 


'YX 


Ay  (t) 


h"1  (b)[c'(b)Ry(b,t)  +  c(b)R1y’°(b,t)]  -  h-1  (a)[c '(a)Ry(a,t)  +  c^R^Vt)]  ■ 


and  Ayx  likewise  with  Ry  replaced  by  Ryx>  It  should  be  noticed  that  Byx(t)  =0 

when  X  and  Y  are  jointly  stationary,  as  well  as  when  Ryx(t,s)  is  a  symmetric 

function  of  t  and  s;  the  latter  is  the  case  when  Y  =  X,  or  Y  =  X  +  independent  noise 

(see  Section  III. A),  or  even  when  Y  is  a  zero-memory  nonlinear  transformation  of 

X  (see  Section  III.C)  possibly  plus  independent  noise  (see  Section  III.B). 

As  has  been  pointed  out  in  the  paragraph  following  Eq.  (10),  in  this 

case  the  estimator  weights  c(t)  should  be  chosen  in  such  a  way  that  the  resulting 

2 

asymptotic  mean  square  error  em  ^  of  (10)  whould  be  close  to  its  minimum  value 
2 

em.  This  is  the  primary  consideration  in  choosing  c,  and  any  further  considera¬ 
tions  such  as  those  resulting  from  the  form  of  the  asymptotic  constant  in  (15) 
are  only  of  secondary  importance.  Since  the  asymptotic  constant  in  (15)  is  not 
necessarily  positive,  some  choices  of  c  may  produce  a  negative  value  indicating 


WtWjw  JUjS  '*< 
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2 

that  for  sufficiently  large  n  the  performance  would  be  at  least  as  good  as  em  oo, 

whose  value  would  perhaps  be  appreciably  larger  than  its  minimum.  Also  judicious 

choices  of  c  may  exist  which  render  the  asymptotic  constant  in  (15)  equal  to 

zero,  indicating  a  faster  rate  of  convergence  which  is  not  currently  known. 

However,  the  dependence  of  Ay,AyX  on  the  boundary  values  of  c  and  c  '  compl icates 

the  question  of  existence  of  such  c's,  and  more  significantly,  even  though  one 

can  derive  the  general  form  of  such  choices  of  c,  it  does  not  seem  feasible  to 

2 

determine  how  close  the  resulting  asymptotic  mean  square  error  em  ^  can  be  to 
2 

its  minimum  e  We  therefore  do  not  pursue  this  matter  any  further.  Incidentally, 
the  estimator  weight  which  minimizes  the  asymptotic  constant  in  (15)  is  given  by 

c(t)h(t)  =  2a  'StT  +  Ay(t)h2(t)} 

Y  ,0 

for  a  "t<b,  with  appropriately  determined  values  of  c(a),  c(b),  c'(a),  c'(b)  via  a 
system  of  four  linear  equations  resulting  from  the  dependence  of  Ay,AyX  on 
these,  and  with  corresponding  minimum  value  of  the  asymptotic  constant  in  (15) 

"  43  4^(0YX  h  +  AYh^  +  12  ^AYXh' 

Recall  that  in  the  noiseless  case  Y  =  X,  we  have  <)>  =  f  and  the  appropriate 
choice  of  c  is,  by  (11),  ch  =  f.  It  is  therefore  of  interest  to  determine  the 
best  constant  multiple  of  f  as  a  possible  value  of  ch.  Thus  putting 

c(t)h(t)  =  a f ( t )  ,  a  t  b,  ( |6) 

2 

in  the  expression  (10)  of  em  ^  we  find  that  the  value  of  A  which  minimizes 

.  2 
the  asymptotic  mean  square  error  e  is 

m  .oo 
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(18) 
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with  resulting  estimator  in  this  case  of  no  quadratic  mean  derivative 
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m,n 


>  n  f  ( 

•  A  r(  V 


minimum  asymptotic  mean  square  error  value 
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'm,o° 


ff Rxff  - 


(/fRYXff)‘ 


//  Ryff 


(19) 


and  asymptotics 


„V  -e2  ) 

'  m,n  m,®7 


(20) 


The  estimator  (18)  has  the  advantages  of  being  generally  applicable 
and  fairly  nonparametric ,  in  that  it  depends  on  the  correlation  functions  only 
via  the  integrals  in  (17).  In  sharp  contrast,  the  estimator  (12)  requires  the 
solution  of  the  integral  equation  and  thus  fairly  detailed  knowledge  of  the 
correlation  functions,  and  so  does  of  course  the  optimal  coefficients  estimator. 
Thus  the  estimator  (18)  is  more  robust  with  respect  to  inaccuracies  in  our 
knowledge  of  the  required  correlation  functions  than  the  other  two  estimators. 

It  can  be  used  for  its  simplicity  and  robustness,  instead  of  the  estimator  (12), 
even  when  the  integral  equation  has  a  twice  continuously  differentiable  solution, 
i.e.  in  Case  1,  and  in  this  case  one  would  want  to  know  how  much  performance  is 
is  lost  asymptotically  because  the  limiting  mean  square  error  e^  of  (19)  exceeds 
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l*- ,  i.e.  one  would  want  to  comoute 
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Observation  process  with  rational  spectral  densit 


When  the  observation  process  Y  is  stationary  and  has  rational  spectral 
density,  the  integral  equation  (5)  or  (9)  has  a  solution  which  contains 
delta  functions  and  their  derivatives  at  the  endpoints  of  the  observation 
interval  [8,Ch.  Ill,  Sect.  7].  In  particular,  when  Y  has  no  quadratic  mean 
derivatives  (k=0)  then  the  solution  contains  only  delta  functions  so  that 


g(t)  =  Ry(t-u)<f>0(u)du  +  ARy(t-a)  +  BRy(t-b) 


where  <J>Q  is  a  continuous  function  [8].  In  this  case  Sacks  and  Ylvisaker  [9] 
show  that  their  asymptotic  result  is  still  valid  with  <{>0  playing  the  role  of 


2/  2  2, 
n  (e  c  ) 
'  r,n  - 


Y  ,0 

TT 


a  h7" 


It  is  also  straightforward  to  check  that  if  we  adjust  the  simple  coefficients 
estimator  using  median  sampling  designs  {Tn>  by  adjoining  the  endpoints  with 
appropriate  weights  and  if  we  choose  c(t)  from  c(t)h(t)  =  <t>Q(t): 


Im  n  =  AY<a 

m,n 


)  +  BY(b)  *  Lj  l  2  ,  )  (22) 

k=1  h<V2,k> 


2  ~  p 

(n>2),  its  mean  square  error  e  =  E ( I - I  )  satisfies 

—  m,n  m,n 
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i.e.,  it  converges  to  (rather  than  to  the  larger  e^  J  and  with  the  same  rate 
2 

as  cr,n’  Provlded  Vh  is  twice  continuo:-JSly  differentiable.  Thus  in  this  case 


I 


the  modified  simple  coefficient  estimator  using  median  sampling  is  asymptotically 
optimal . 

In  closing  we  should  note  that  while  the  assumption  of  rational  spectral 
density  is  frequently  reasonable,  such  as  when  the  observation  is  signal  plus 
noise,  there  are  important  cases  where  it  is  unlikely  to  be  satisfied,  such  as 
those  considered  in  Parts  B  and  C  of  Section  III. 
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III.  SPECIAL  CASES  AND  EXAMPLES 


In  this  section  we  consider  several  special  cases  of  noisy  observations 


of  interest. 


A.  Independent  Signal  plus  Noise 


Suppose  that  Y(t)  =  X(t)  +  N(t),  a  <  t  <  b,  where  the  noise  process  N  is 


independent  of  the  signal  X  and  has  zero  mean.  In  this  case  RyX  =  Rx  and 
Ry  =  Rx  +  R^.  We  consider  in  more  detail  the  following  two  special  cases. 


Gauss-Markov  Liqnal  and  Noise 


Suppose  we  desire  to  estimate  the  average  of  a  Gauss-Markov  signal  over 


the  unit  interval,  when  observed  in  additive  independent  Gauss-Markov  noise. 


i  .e. , 


Rx(r)  =  ox  exp[-ax [t | ]  , 


M-0  =  ot  exp  [-ajx  |  ]  , 


f(t)  =  1 ,  0  <  t  <  1 . 


In  this  case 


-aYt  -aY(  1  -t) 


I  Oy  “3 yL  “3 y\ 

g(t)  =  Rx(t-u)  du  =  (2  -  e  -e  * 

J0  x 


0  <  t  <  1.  When  ax  +  aN  the  integral  equation  (5)  has  a  solution  <j>  containing 


delta  functions: 


6(t)  =  <|>0(t)  +  M  (6(t)  +  6 ( t- 1 ) } 


••  V  V  V  -.' 


(25) 


4*0 ( t )  =  M1  +  M2  {exp[-at]  +  exp[-a(l-t)]} 


and  the  values  of  the  constants  M,  ,  M2,  and  a>0  are  given  in  Appendix  B. 
The  optimal  mean  square  estimation  error  is  computed  from  (3), 
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'R(Rv) 


,  where 
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0  Jo 


R^(t-u)dt  du  =  (2a^/a^)(e  ^-1+a^)  =  r^ 


and  | |g| | 


2 

R(Ry) 


rl 

g(t)4>( t)dt  is  easily  calculated. 

Jo 


This  optimal  error  is  the  limiting  (large  sample  si2e)  value  of  the  mean 
square  estimation  error  when  using  optimal  coefficients  and  regular  sampling, 
cf.  (21),  or  the  adjusted  simple  coefficient  estimator  of  (22)  and  median 
sampling,  cf.  (23).  When  using  median  sampling  with  the  nonadjusted  simple 
coefficient  estimator  (18)  with  the  optimal  constant  X  of  (17),  the  limiting 
value  of  the  mean  square  estimation  error  of  (19)  is  given  in  this  case  by 


2  2 
rXrN 


2  2  2 

and  the  optimal  scaling  constant  by  X  =  rx  /  (rx  +  pn)-  Asymptotically, 

the  loss  of  performance  by  using  the  nonadjusted  versus  the  adjusted  simple 

2  ?  ? 

coefficient  estimator  can  be  measured  by  the  ratio  100(e  -  e  )  /e  which 

m,°°  oo#  '  oo 

is  plotted  in  Figure  1  for  a  range  of  parameter  values.  It  is  surprising 
that  over  a  very  large  range  of  parameter  values  this  ratio  does  not  exceed 
nine  percent,  indicating  only  a  moderate  loss  of  performance  when  the  simple 
coefficient  median  sampling  scheme  is  not  adjusted  by  including  the  interval 


9 


endpoints  with  appropriate  weights.  Only  as  a^/ax  approaches  zero  the  loss 
of  performance  becomes  large,  indicating  a  substantially  improved  performance 
of  the  adjusted  scheme. 

In  Figure  2  we  show  how  the  asymptotic  mean  square  error  decreases 
with  increasing  sample  size  n  in  the  following  cases  by  plotting  the  corre¬ 
sponding  asymptotic  expression  of  the  mean  square  error: 

(a)  no  noise,  uniform  sampling,  optimal  or  simple  coefficient  estimator: 

10  log1Q  {(ax/6)n'2}  (with  dash-dot  line)  (26) 

(b)  noise  present,  uniform  sampling,  optimal  or  adjusted  simple  coefficient 
estimator  (cf.  (21)  and  (23)): 

10  log10  fcf  +  (g-  (ax  +  aN);  «2)  n-2}  (with  solid  line),  (27) 

(c)  noise  present,  optimal  estimator  with  optimal  regular  sampling,  or 

simpler  coefficient  estimator  with  optimal  median  sampling  (cf.  (21)  and 

2/3 

(23)  with  optimal  h*  proportional  to  |<J>q|  '  ): 

10  loq1Q  {eM+  (g-(ax  r  aN)(/|40|2/3)3}n"2}  (with  solid  line), 

(d)  noise  present,  uniform  sampling,  nonadjusted  simple  coefficient  estimator 
(cf.  (20)) 

?  -2 

10  log,n  (e  +  Cn  }  (with  dotted  line), 
lu  m,® 

C  =  £  U(ax  +  aN)  -  (1  -  A)a2(l  -  e  X)  +  Xo2(l-e  N)}. 


.  -vv.s . 
*  *  "  **” 


, 

v', 


where 


20 

As  it  turns  out,  the  constant  term  of  4>q  is  much  larger  than  the 
term  for  the  values  of  the  parameters  we  plotted.  Hence  <j>q  is  nearly  always 
flat  and  the  optimal  sampling  density  h*  in  (c)  is  nearly  uniform,  i.e. 
(/I*/'3)3  nearly  equals  /<j>g,  and  in  our  plots  cases  (b)  and  (c)  were  apparently 
identical.  Note  also  that  the  expressions  we  plotted  for  n=2  to  21  are 
asymptotic  values  valid  only  for  large  n  (typically  n  _>  5) .  In  Figure  3  we  give 
some  plots  of  the  actual  mean  square  error  for  n  =  2  to  n  =  12,  for  the  case  of 
av=0w  =  a  =1  and  aw  =  2‘  In  this  fi9ure  we  also  Plot  one  of  the  asymptotic 
curves  (the  (b)  curve  of  the  previous  set)  in  order  to  compare  with  the  actual 
values.  As  can  be  seen,  the  actual  and  asymptotic  values  are  very  close  even 
for  n  =  5. 

It  is  of  interest  to  know  how  many  samples  are  required  asymptotically 

2 

t<  attain  a  given  performance,  say  for  error  (1  +  B)rm.  When  no  noise  is 
present  and  uniform  sampling  is  used  with  optimal  or  simpler  coefficient 
estimator,  expression  (26)  gives 

2  aX 

"NN'U  '  6(1  *  ' 


With  noise  present  and  uniform  sampling  with  optimal  or  adjusted  simpler 
coefficient  estimator,  expression  (27)  gives 


7he  ratio 


oo 


nN,U 

nNN,U 


{(1  +  ax/aN)  (1  +  l/p)  /  4>q  |1/2 


summarizes  the  effect  of  noise  upon  the  necessary  sampling  rate.  We  have 
plotted  this  ratio  in  Figure  4  for  some  representative  values  of  the 


parameters.  Again  as  aN/ax  approaches  zero  the  effect  of  noise  becomes  much 


more  marked. 
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Proportional  Signal  and  Noise  Correlations 


We  now  consider  the  case  where  =  yRx  for  some  positive  constant  y . 

In  this  case  Ry  =  (1  +  y)Rx  and  g  =  s  =  R^f  =  (1  +  yJ^Ryf  so  that  the  integral 
equation  always  has  a  continuous  solution  <j>  =  (1  +  y)~V.  We  also  have  aY  q  = 

(1  +  Y)aXj0  and 

ei  =  ff  Rxff  -  ff  Ryw  =  (1  — — )  //Rvff  =  rh 7  °2’ 


1  +  Y 


so  that 
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12(1  +  y) 


ax,of 


4  A 

1  +  Y 


and  similarly  for  e  ,  Provided  f/h  is  twice  continuously  differentiable.  Thus 

m,n 

the  asymptotically  optimal  sampling  density  h*  is  proportional  to  (ax  and 

thus  independent  of  the  noise  (i.e  of  y).  For  larqe  sample  size  n.,  both  an^ 

2 

>  „  are  approximately  eoual  to 
m,n  • 

rl—  2  +  4>  ■  (28) 

1  '  n 

In  Figure  5  we  have  plotted  this  expression  for  Gauss-flarkov  signal 
with  c?x  =  1,  ax  =  1,  for  f=l  (average  of  process)  and  h=l  (uniform  sampling), 
and  ax  q  =  2ax  =  2,  hence  =  1/6,  o'  -  /q/q  dtdu  =  2/e.  The  expression 

is  plotted  as  a  function  of  n  (=  1-20)  and  parametrized  by  y  with 


(a)  y  =  co 
)  Y  “  2 

(c)  y  =  1 

(d)  y  =  .5 


(circles) 
( squares) 
(pluses) 

( stars) 


(e)  y  =  0  (no  noise)  (lines)  . 


As  in  the  previous  case  it  is  of  interest  to  know  how  many  samples  are 

2 

needed  asymptotically  to  achieve  a  certain  mean  square  error  c  : 

y(l  +  y)~^a2  <  e2  <  a2.  It  follows  from  the  asymptotic  expression  (28)  of 

the  mean  square  error  that 


n  (Y)  = 


Y(o2  -  e2) 


and  with  no  noise  present 

2 

n  (0)  =  . 

e2 

The  ratio 

n(y)  _  _ e _ 

(e2  -  y(o2  -  c2))1/2 

increases  with  y  from  1  to  infinity  as  y  approaches  c2/( o2  -  c?).  The  first 
order  (linear)  approximations  in  y  are  for  the  weak  noise  case,  i.e.  y  -  0, 


(y) 

nW 


~  1 


+  i  (- 


1)  , 


and  for  the  comparable  noise  and  signal  case,  i.e.  y  -  1, 


B.  Nonlinear  Distortion  plus  Noise. 

Suppose  that  the  signal  X(t)  has  suffered  some  nonlinear  distortion, 
in  addition  to  being  corrupted  by  noise,  i.e.  that  the  observation  process 
Y(t)  is  of  the  form 

Y(t)  =  A(x;t))  +  N(t) 


where  X(t)  is  stationary  and  Gaussian  with  mean  zero,  A(-)  is  a  memoryless 

o 

nonlinearity  such  that  EA  (X(0))  <  and  N(t)  is  an  independent,  zero-mean, 
wide  sense  stationary  noise.  Then  Y{t)  is  wide  sense  stationary  and. 


assuming  for  simplicity  that  R^(0)  =  1 ,  we  can  write 

oo  a2 

Ry(t)  =  l  —  Ry(r)  +  Rn(t) 
Y  k=0  kl  A  M 


where  ak  =  E[A(X( t> )  Hk(X(t))]  and  fHk(x)\  are  the  Hermite  polynomials 
Also,  by  the  cross-covariance  property,  we  have 


Ryx(t)  =  d  Rx(t) 


where  d  =  E[A(X(t  ))x(t )]  (see  [1]).  Therefore  g  =  Ryxf  =  d  R^f  =  ds  and 


_  2  _  _2 


=  a*  -  d‘ 


|S| ! R(Ry)  ' 


It  should  be  noted  that  if  A  is  an  even  function  then  d=0  and  RyX=0, 
i.e.  the  observation  process  Y  is  orthogonal  to  the  signal  process  X,  and 
thus  no  linear  estimate  based  on  Y  is  better  that  zero  as  an  estimate  of  the 
integral  of  X.  We  therefore  assume  throughout  that  A  is  such  that  d^O 
(e.g.  an  odd  function) . 

When  optimal  estimator  coefficients  and  regular  sampling  are  used, 

o 

the  asymptotics  are  determined  by  (7)  with  rate  of  convergence  1/n  , 
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provided  the  integral  equation  Ry<J>  =  d  R^f  has  a  continuous  solution 
<j>t  and  Ry(t)  has  the  needed  differentiability  properties,  i.e.  a  finite 
positive  jump  -ay  ^  in  its  derivative  at  zero.  When  the  simple  estimator 
coefficients  and  median  sampling  are  used  then  the  rate  of  convergence  is 
1/n2  provided  Ry(i)  has  the  needed  differentiability  properties,  and  the 
precise  asymptotics  are  determined  by  (13)  when  the  integral  equation 
Ry$  =  dR^f  has  a  twice  continuously  differentiable  solution  <j>  and  the  estimator 
(12)  is  used,  and  by  (20)  when  the  estimator  (18)  is  used. 


Smooth  Limiters .  As 
A,(x)  ■= 


an  example,  let  us  consider  the  so-called  smooth  1  imiters 

X  o  o 

-»L  e"z/2v  dz 
v£rr  v  -*0 


for  which 


d  = 


2K 


/£tt  (1  +  V2) 


Ry(t)  =  ~-  sin-1 


Rx(t)  ] 


1  +  v‘ 


+  VT) 


It  is  easily  seen  that  if  R^  and  R^  have  the  needed  differentiability  pro¬ 
perties  to  define  a x  0  and  Q  then  so  does  Ry,  provided  v  >  0,  and  in  fact 


2K2a 


XY  ,0 


X,0 


it  /v(v  +  2) 


+  a, 


N,0. 


Also  a YX  q  =  dax  q.  Thus,  provided  v  >  0,  the  rates  of  convergence  remain 
2 

1/n  ,  as  in  the  absence  of  nonlinear  distortion. 

■  lard  Limiter.  The  case  of  a  hard  limiter 
A(x)  =  K  sgn(x)  =  AQ(x) 
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requires  special  attention  since,  assuming  Rx  and  RN  have  the  needed 

differentiability  properties  to  define  q  and  q,  it  follows  that 

2 

aY  0  =  +°°  inditing  3  r3te  °f  convergence  slower  than  1/n  .  This  is  because 


*»<*>  ■  “ 


sin"  (Rx(t))  +  Rn(t) 


implies  Ry(O-)  =  +°°,  Ry( 0+)  =  -00.  While  the  first  derivatives  of  Ry  at  0+ 
are  not  finite,  we  notice  that  the  "one-half  derivatives"  are.  Indeed  we  have 


R<1/2V)  a  lim 

1  TtO 


2K 

TT 


=  2k: 

7T 


2T 

IT 


Ry(0)  -  Ry(t) 


(-t) 


Trr~ 

rx{t) 


lim 

T+0 


lim 

T+0 


’  [1  -  r|(t)J1/2 

-  1  <-T)-’'2 


2  «x(t) 


1  -  R,(t) 


-T 


[1  +  Rx(t)] 


V2 


2  Rx(0-) 
(2RX(0-)}1/Z 


{2RX  (0-)}1/2 


I 


✓ 

V 

I 


f 

\  . 
V 


2K2  V2 

TT  aX,0 


Similarly 


r(V2)  (0+)  a  )1m  VT)  -  V0) 

T  T+0  TU 


2K2  a 1/2 
TT  X  ,0 


and  thus 


a  £  R(1/2)rO  )  -  R{1/2W)  -  liC  > 

aY,l/2  RY  10  '  KY  10  '  tt  aX,0 


The  expression  (14)  for  the  mean  square  error  when  the  simpler  coefficients 
are  used  becomes  in  this  case: 


em,n  "  em,°°  =  CT  RY,T  CT  “  YYRY(ch^ch^ 
n  n  n  n 


-2d{i  cl  (Ryf)T  -  //  Rv(ch)(f)} 
n  'n  A  'n  * 


A  (1st  term)  -  2d  (2nd  term) 


For  the  second  term  we  have  as  in  (15)  (cf.  Eq.  (  )),  n^(2nd  term)  -*■  finite 

constant,  provided  c  and  f/h  are  twice  continuously  differentiable.  The  rate 
of  convergence  of  the  first  term  can  he  found  in  a  similar  way.  Instead  of 

using,  say  for  r  >  0,  Ry(r)  =  Ry(0)  +rRy(0+)  +  o(r)  we  new  use 

r  (T)  =  Ry(0)  +  /F  rV/2)(0+)  +  o(/r)  (and  similarly  for  t  <  0). 

It  is  shown  in  Appendix  C  that,  provided  c  and  h  are  twice  continuously  differ 

entiable,  and  R have  finite  one-sided  second  derivatives  at  0,  we  have 


where 


n3/^2(lst  term)  -►  yay 


Y  =  3^5  '  16  ^2^  ~  -21225 


(30a) 


(30b) 


and  ;(3/2)  =  T^^k”3^2  =  2.612  is  Riemann's  zeta  function  evaluated  at  3/2.  T 
first  term  in  (29)  is  thus  the  dominant  one,  and  in  this  case  of  signal  and 
noise  with  no  quadratic  mean  derivative  we  have 


Hard  limiting  thus  reduces  the  rate  of  convergence  by  a  one  half  power  to  1/n 

2 

from  the  rate  of  1/n  of  soft  limiting  or  no  nonlinear  distortion.  If  ch  is 

chosen  proportional  to  f,  then  the  sampling  density  h*  which  minimizes  the 

4/5 

asymptotic  constant  is  proportional  to  |f|  . 


C.  Quantization 


In  applications  one  frequently  has  access  only  to  quantized  data.  Here 
we  assume  that  our  observation  process  is 

Y(t)  =  0(X(t))  A  0X(t) 

where  0  in  an  N-level  quantizer  with  Q(x)  =  y^  when  <  x  <  x^  where 

=  X,  <  y,  <  x2  <  y2  <...<  xN  <  yN  <  xNt1  =  and  xk  =  (yk  +  yk.,)/2  for 
k=2,...,N.  The  process  X(t)  is  stationary  and  Gaussian  with  mean  zero  and 
variance  one.  (More  generally  X(t)  can  be  taken  any  wide  sense  stationary 
process  whose  bivariate  densities  have  a  diagonal  expansion,  so  that  the  cross¬ 
covariance  property  holds  —  see  [1]). 

Optimal  Coefficients 

We  first  consider  the  case  where  the  optimal  coefficients  estimate  based  on 
the  quantized  samples  is  used: 

*Q,T  =  CQ,TQXT  =  9Q,TRQX,TQXr 

where  9q  =  Rqx  X^  =  dQRXf  =  dQ9,  9  =  RXf’  dQ  =  Denoting  as  usual  by 

A  A  A  1 

IT  the  optimal  coefficients  estimate  based  on  the  samples  X^  :  IT = c-j-X-j. -  gjR~ 1 yXT , 
we  can  write  the  mean  square  error  as 

eQ.T  ■  E<'  -  >0,/ 

-  E(I  -  iT)2  +  E(It  -  Iq_T)2  *  2E[(I  -  ?T)(tT  -  i0  Tn  . 


''  A 

The  third  term  vanishes  as  E[(I  -  Iy) Ij]  =  0,  since  IT  is  the  projection  of  I 
onto  the  linear  span  of  Xj,  and,  by  the  cross-covariance  property, 

EC(I  -  IT)IQ,T]  =  E[(/Xf  '  ^TXT}  "Q,TQXT] 


=  ZCQfT#k{/E[X(t)QX(tk))]f(t)dt  -  JcTJE[X(tj)gX(tk))j} 

=  dQ  XcQtTik{/E[X(t)X(tk)]f(t)dt  -  IcTJE[X(tj)X(tk)]} 
k  j 

=  dqE[(I  -  iT)CqjTXT]  =  0. 


Thus  the  mean  square  error  decomposes  into  two  components: 

2  ;  \2  A  2  2 

£q  y  E(I  “  ly)  ^  E(Iy  *  y)  ~  £y  £g  y  • 


The  first  term  is  due  to  sampling  and  has  been  discussed  in  Section  II;  its  ex- 


2  2  2 

pression  is  eT  =  o  -  ||P^g||R/^  y  The  second  term  is  due  to  quantization  of  the 

X 


samples  used  to  estimate  the  integral,  and  is  given  by 

2 


CQ,T  E^9TRx!tXT  "  dQ9TRQX,TQXT^ 


9TRx!t9T  "  dQ9TRQX,T9T 


=  HPTgll|(Rx)  ~  dQ'l  PT9 R(Rqx) 


where  the  cross-covariance  property  has  been  used, 
quence  {T  }  of  sampling  designs  we  have 


Thus  when  using  a  regular  se 


W»2-dQllV!l^X>  " 


dQllgllR(Rqx) 


A  2 
"  eQ.° 


(That  g£R(Rqx)  follows  from  dqg = gq e  R(Rqx) ,  cf.  statement  preceding  Eq.  (3)). 
It  follows  that 

eQ,r,n  "  eQ,=°=  dQ^g  "  PTg^R(Rqx) 

and  the  conjecture  here  is  that,  as  in  the  case  of  hard  limiter,  its  rate  of 

-3/2 

convergence  to  zero  is  n  ,  when  X  has  no  quadratic  mean  derivative  (k  =  0) 


at  the  origin,  and  where 


and  the  derivative  of  has  positive  jump  q 


Rqx$  *  R^f  has  a  continuous  solution  <p  (but  no  proof  is  currently  available). 


Simple  Coefficients 

Next  we  consider  the  case  where  the  simple  coefficients  are  used  along 
with  a  sequence  of  median  sampling  designs  generated  by  the  sampling  density  h$ 
The  resulting  estimate  is 


I - ■  l  I  c(tnik)QX(t.  J 


m,Q„n  n 


n,k 


with 


2  2 
em,Q,n  "  *m,Q,n^ 


"  ff  RXff  '  n  CTn(RQX,Xf)T  +  \  CT„  R0X,T  CT 


n  n  n 


n  n 


ff  Rxff  -  Iff  RQXjX(f)(c\)  +  //  R0X  Uhs)  (chs) 


E[  /  Xf  -  /  (OX) (ch$)]2  A  e2>m. 


(31) 


by  [4,  Lemma  2].  The  asymptotic  error  et  is  minimized  for  some  c,h  ,  if  and 

y »°°  S 


only  if  the  projection  of  1  =  / Xf  onto  the  span  of  QX  is  of  the  form  /QX -4>  for 
some  cp  e  L„  (  in  which  case  then  ch  =<|>),  or  equivalently  if  and  only  if  there 
is  an  solution  <p  to  the  integral  equation  dq* =  RqX4> •  From  (31)  it  is 


clear  that  as  N-*-®,  e^  -*•  E[/X(f  -  ch$)]^.  Thus  asymptotical ly  for  large 


number  *1  of  levels  of  quantization,  the  best  choice  for  c  and  h  is 


c(t)h$(t)  =  f(t)  ,  a  <  t  <  b. 


(32) 


Throughout  the  rest  of  this  subsection  we  assume  this  to  be  the  case,  so  that 


eg..  =  E  [  /  (X  -  QX) f  ]2  =  ff  Rx_oxff. 


Under  this  assumption  the  mean  square  error  is  again,  at  least 


3 


asymptotically  as  the  number  of  samples  n  tends  to  infinity,  the  sum  of  a 
mean  square  error  due  only  to  sampling  and  of  one  due  to  the  quantization 
of  the  samples.  Indeed  with 


1 


rm,n  "  n  j-  C^W  X^n,k^ 


we  obtain 

2 


e"  n  =  E(I  -  I  J2  +  E ( I  -  I  n  )2  +  2E  [(I  -  I  J(I  -  I  n  J]. 

m,0,n  m,n  m,n  m,Q,n  m,n  m,n  m,Q,n 


By  the  cross-covariance  property. 


E[( I  -  I  n)  I  n  J  =  dnE [ ( I  -  I  J  I  n] 
L  m,n  m,Q,n  0  m,n  m,n 


and 


ECU  “  J1™  J  =  1  C'  (RYf)T  -  V  C;  Ry  T  CT 

m»n  n  T  v  X  T_  2  T  X,T  T 


n  n  n  n  n 


//  Rx(f)  (ch$)  -  //  Rx  (chs)  (chs) 


=  0, 


by  (32).  Thus  the  cross  term  in  (33)  tends  to  0  with  n. 


2  2 

We  now  study  the  rate  of  convergence  of  em  g  n  to  e q  ^  as  n-**>,  assuming 


that  Rx  has  finite  one-sided  derivatives  at  0  required  for  the  definition  of 
q.  As  in  Section  II. B,  we  have 


e_  r>  _  ~  en  ff  RYff  -  2dn  c|  (RYf)T  +  — y  ci  R, 


"m,Q,n  0,~ 


Q  "  O  X  'Tn  7  Tn  QX.T 

n  n  n  n  n  n 


-//(Rx  -  2dQRx  +  RQX)  ff 


-1 


=  {  2  cj  RqX  T  CT  "  ff  R0Xff} 
n  n  n  n 


*2dn{^  c;  (Rvf)T  -  ff  Rxff } 


JQln  T  v  X  T 
A  n  n 


32 


=  (1st  term)  -  2dg(2nd  term). 


Assuming  that  f,h,f/h  are  twice  continuously  differentiable,  we  have  for  the 

2 

second  term  as  in  (15)  (cf.  Eq.  (51)),  n  (2nd  term)  -*■  finite  constant.  It  is 
shown  in  Appendix  D  that 


aQX,l/2 


(2ax,0/TT)  2Bq 


(34a) 


where 


i  n  2  ■ 

=  ~  l  (Vyk-1}  e 

ffr\  k=2  K  K  1 


■v8(yk+yk-i)' 


(34b) 


and  that  as  x  -*  0, 

M!/2rqx(t)  ■*§pqx>‘/2-  U5) 

This  case  generalizes  the  hard  limiter  case  considered  in  Section  II. B.  In 
fact  an  inspection  of  the  proof  of  (30)  in  Appendix  C,  shows  that  the  relation¬ 
ship  between  X  and  Y  affects  the  asymptotics  described  by  (30)  only  via  av  ^ 
and  54(b),  which  is  identical  with  (35)  above.  It  then  follows  from  Appendix  C, 
or  (30),  that  provided  Rx  has  finite  one-sided  second  derivatives  at  0,  we  have 


3/ 

"  (’St  tern)  -Y — 
with  y  given  in  (30b).  We  thus  have 


-  1'(2oX,0/",Y2Vr>J 


Van  rJL 


(36) 


Quantization  therefore  reduces  the  rate  of  convergence  by  a  one-half  power  to 

_3/ 

n  ,  independently  of  the  number  of  quantization  levels;  and  the  optimal 

sampling  density  h*,  which  minimizes  the  right  hand  side,  is  proportional  to 
i  f  1 4A 


It  is  very  fortunate  that  in  (36)  the  effects  of  quantization  and  of 


sampling  are  coupled  in  a  rather  simple  way.  For  large  n: 


This  can  be  used  to  study  the  interplay  between  the  number  N  of  quantization 

levels  and  the  number  n  of  samples  used,  asymptotically  as  N  and  n  tend  to 

infinity.  Suppose  that  a  "regular"  sequence  of  quantizers  {Q^}  based  on  a 

continuous  density  hn  is  used,  i.e.  the  levels  y..  ,  <  y..  ~  <...<  yM  of 

*•<  N  9 1  N  9  c.  N ,  IN 

quantizer  QN  are,  respectively,  the  1/(2N),  3/(2N),  ...  ,  (2N  -  1)/(2N) 

quantiles  of  hq.  With  p(x)  denoting  the  standard  normal  density  of  X(t), 

1  /3 

choosing  hn  proportional  to  [p( x ) ]  ,  i.e. 

.  x2 

hn(x)  =  — - —  e  ,  -»  <  x  <  °°, 

g  s&r 


we  obtain  an  asymptotically  optimal  sequence  of  quantizers.  As  in  [3],  with 


1 

N 


N,k 


yN,k-l 


hq(x)  dx  =  hq(zN>|<)  (yNj|<  -  yN>k_-,) 


-1  • 


for  some  y^  <  2n  k  <  yN  k’  we  °btain  (assuming  phq  is  Riemann  integrable 
over  (-ri,co)), 


1 


NAn 


N  8  VJ,N,k  ^N.k-l) 


N  SB  k=2 


hQ(zn,k> 


*yN,k  "  yN,k-l* 


-i(2x)2 


N  SI tT  >  -oo  hg(x) 


dx  = 


P(x) 


hg(x) 


dx 


When  h?  is  used  the  asymptotic  constant  becomes 


Thus  for  large  N, 


'N 


l 

N 


_P_ 

h. 


The  precise  rate  of  convergence  to  zero  of  e«  seems  harder  to  determine 

V00 


However  we  have  the  following  bound,  assuming  /  phq  < 
b 


'V 


=  E  { 


[X(t)  -  QNX(t)]  f(t)  dt 


E[X(t)  -  QNX(t)]  [X(s)  -  QnX(s)]  f(t)  f(s)  dt  ds 


<  E[X(t)  -  Qw(X(t))]2  (  |f|)2  , 


and  by  [3], 


N^eq  £  M2E[X(t)  -  QN(X(t))]2  (J  |f|2 


hp  'a 


Again,  when  h*  is  used  we  have 

Q 


p(x) 


dx  =  6/3 


71  . 


-°°  0*  (x)]‘ 


Thus  for  large  N, 


From  (37),  (38)  and  (39)  we  see  that  for  large  N  and  n 


The  bound  (40)  can  be  used  to  determine  the  allocation  to  quantization  levels 
N  and  sample  size  n  to  achieve  mean  square  error  not  exceeding  a  desirable 
value  e2.  For  instance  if  we  choose  N  =  n^  the  required  numbers  of  quanti¬ 
zation  levels  and  samples  are  N  =  (C  +  D)^e'\  n  =  (C  +  Also  the 

more  interesting  problem  of  minimizing  the  total  number  of  samples  N  +  n  (or 
some  other  function  of  N  and  n  reflecting  quantization  and  sampling  costs) 
subject  to  mean  square  error  not  exceeding  a  desirable  value  e2  can  be  solved 
(numerically  -  analytic  expressions  being  hard  to  obtain). 


APPENDIX 


A.  Proof  of  (15) 

Here  we  find  the  rate  of  convergence  and  the  corresponding  asymptotic 
constants  of  the  two  terms  on  the  right  hand  side  of  (14).  These  results  cor 
rect  (and  in  fact  the  second  also  extends)  the  statements  in  [4]  displayed  be 
tween  Eqs.  (3.36)  and  (3.37);  the  final  result  (3.37)  in  [4]  remains  correct. 
For  simplicity  of  exposition  throughout  the  following  we  do  not  display  the 
terms  of  higher  order  and  we  use  -  to  indicate  equality  up  to  higher  order 
terms. 

Sampling  points  and  subdivision  points 

t  . 

The  sampling  points  t  .  are  determined  by  /  n,Kh  =  (k-V2)/n,  k=l,...,  r 

n ,  K  3  $ 

Introduce  the  interval  subdivision  points  sn+^  k  by  /an+^ ’^h =  (k  -  1 )/n, 
k  =  l,...,  n  +  1,  so  that  each  tn  k  is  the  median  of  (sn+1  k>sn+]  k+-| )  with  re¬ 
spect  to  h.  For  notational  simplicity  we  drop  the  subscript  n  from  t  .  and 

n  5  k 

n  +  1  from  s  ,,  ,  . 

n+ 1  $  k 

3y  the  mean  value  theorem 


J  h  =  h(“k)(sk+r  skK 


~  =  /  h  =  h(ak)(tk  -  sk). 


2n  J  h  =  h^bk^sk+l  "  V’ 


where  s^  <  wk  <  sk+i >  sk  <  ak  <  ^k  <  bk  <  Sk+1 '  follows  that  as  n-»>, 


3 


so  that  asymptotically  as  n-*-°°,  tk  is  the  midpoint  of  (sk+1>sk).  We  will  need 
the  order  of  magnitude  of  the  difference  of  the  two  pieces 


Dk  ^sk+l  "  V  "  (tk  '  sk^ 


Substituting  h ( u )  =  h(tk)  +  (u  -  tk)h>(int.  pt.)  under  the  integral  in  (41b)  and 
(41c),  using  the  mean  value  theorem  and  subtracting  the  resulting  expressions 


we  obtain 


i  Mb’)  Ma')  1 

°k  =  ‘7^1 +  } '  ^ 


The  first  term 

We  first  consider  the  first  term  on  the  right  hand  side  of  (14),  which 
can  be  written  as  follows  with  K(u,v)  =  c(u)Ry(u,v)c(v) ,  a  symmetric  function, 

a  i  b 

1st  term  =  —  X .c(tk)Ry(tk,t  .)c(tj)  -  //  Ry(u,v)ch(u)ch(v)dudv 
n“  k,j  a 

Sk+1  Sj+1  A 

=  l  J  J  [K(t.  ,t.)  -  K(u,v)]h(u)h(v)dudv  =  l  L  .. 
k,j  sk  Sj  k  J  h,j  k’J 


The  integral  over  each  diagonal  square  is  split  into  the  following  six 
regions,  and  by  symmetry  it  suffices  to  consider  only  regions  1,2  and  3. 


We  use  the  Taylor  expansion  of  the  integrand: 


K(tk,tK)  -  K{u,v)  =  K{tk,tK)  -  K(tk,v)  -  (u  -  tk)K] ,0(xk,v) 

=  -(v-tk)K°’1(tk,yk)-  (u-tk)K1’°(xk,v) 

where  xk  is  between  u  and  tk>  and  yk  is  between  v  and  tk-  Over  each  region 
both  u  -  tk  and  v  -  tk  have  constant  sign  so  the  mean  value  theorem  is  applica 
ble.  For  region  1  we  have 

JJ  =  K0,1 (ak,bk)h(ak)h(bk)//  -  (v  -  tk)dudv 
]k  ]k 

+  K1 ’°(a^,bk)h(a^)h(bp//  -  (u-tk)dudv 

]k 

-  {^0,1(ak,bk)h(ak)h(bk)  +  lK1’0(a',b|^)h(a')h(b^)}(tk-sk)3 


where  the  points  (ak>bk)  and  (ak>bk)  belong  to  lk-  Using  (41a)  and  (42)  we 
obtain 


2  ini  h(a,  )h(b.  )  ■.  -i  q  h(ak)h(b/)  t.  -  s.  o 

-L-Y  rf  _  V  r  I  'I  k  >  K  K  .  K  _ K  1  /  K _ k  \  -j 


n  Ij7  =  1^3*  ’  (ak>bk) — 2 

k  K  k  k  k  fT(w.  ) 


h  (wk) 


■)(r^r)  (sk+i 

sk+l  sk  K  1 


n  +  4]j^  *^(t+,t)}dt. 

9 


A  similar  argument  gives 


II  p  (t,t-)  -  ’^(t+,t)}dt, 

k  2k  n  a  lb 

H2 1  IS  ^  /{•  T^’^t.t-)  -  JjK1 ,0(t+,t) }dt. 


Putting  these  together,  and  using  the  symmetry  of  K,  we  have  for  the  diagonal 


terms 


n^k,k  n*  |/{K0J(t,t-)  -  K°J(t,t+)}dt 

k  3 


=  ^JaY>0(t)c2(t)dt. 


Over  each  nondiagonal  rectangle  Mj,  K  has  continuous  partial  deriva¬ 
tives.  Thus  Taylor  expanding  K  and  retaining  only  the  lowest  degree  terms 
we  obtain 


.  sk+l  sj+l 

i  =  l  ^r5TKp’q  Ul**,-)  J  J  (u  -  t.  )p(v  -  t.)qh(u)h(v)dudv 
k,J  0<p+q<2  P*q-  K  J  Sk  S.  K  J 


where  in  fact  the  term  with  p  =  1  =  q  is  of  higher  order.  Using 
h(u)  =  h(t.  )  +  (u  -  t.  Jh^int.  pt.)  and  (43)  we  find 


A  V  1  h'<bk>  hX>  l  hX)  1 

Ak  -  J  (u-tk)h(u)du  *  {-  Tgt-T— r  +  72777]  +  T2  X7T}(sk+i  _skb 


10  h^bk)  h  (ak)  “  h  (ak) 

1  h'(tk}(  xl 

24  h2(tk)(  k+1  kV 


where  sk  <  c  sk+i »  an(1  likewise 

A  k+^  2  1  1 

Bk  [  (u  "  tk)  h(u)du  =  ^2  2,  \ ^sk+l  "  sk^ 


h  (ak) 


’k+1  ‘  V~7 


l(sk+l  -  sk> 


It  then  follows  (using  (41a))  that 

10  lb^bk^  01  i  h  (t.) 

!k,j  =  ‘{K  ’  ^kftj^“  24)h2(t  jh^wj^  +  K  ’  (tk,1:j)h(wk)("  24^2^ 


+  y,0(tL,tJ1o  t  h(w.)  +  ^^UtJhtwJ  Owts,,  -  sJ(SiJ1  - 


and  thus  by  the  symmetry  of  K, 


"2k^k,J  n  K°’'(t>s)5^M-K°-2(t,s)^})dtds 

=  A  /dth(t)(/  +  /)ds  ^t-  K  h(i)’Si] 


a  3 


=  "  lVaY,0c2  "  lVAYch 


where  Ay  is  defined  following  Eq.  (15). 

Finally  adding  (44)  and  (47)  we  obtain 


1  9  1 

n*-[lst  term]  ^  0^  ~  72  ^  Ayc^* 


The  second  term 

We  now  consider  the  second  term  in  (14): 

.  ,  b  b 

2nd  term  =  -  £c(tk)/Ryx(tk,v)f (v)dv  -  //  Ryx(u,v)ch(u)f (u)dv 

k  cl  d 

Vi  sj+i 

“If  /  [M(t. ,v)  -  M(u,v)]h(u)h(v)dudv 
k,j  sk  s. 


-  I  \  •* 

kj  k,J 


where  M(u,v)  =  c(u)Ryx(u,v)f (v)/h(v)  is  a  generally  nonsymmetric  function. 

The  integral  k  over  each  diagonal  square  is  split  into  the  six  regions 
lk  to  6k-  Over  region  lk>  using  the  Taylor  expansion  M(tk,v)  -  M(u,v)  = 

-(u  -  t.  )m"*  ’^(x.  ,v),  s,  <  v  <  x.  <  t.  ,  and  the  mean  value  theorem  we  obtain 


//  =  M1 ,0(ak,bk)h(ak)h(bk)//  -  (u  -  tk)dudv 


=  lM’-0(ak.bk)hUk)l>(bk)(tk-sk)3 


where  sk<bk<ak<tk,  and  thus  by  (41b)  and  (42), 


n2v  ff  _  j_  Mi,o,  .  'h(bk)  lk '  sk  u  .  ) 
F{k'24k  (Vbk>hTa^y  5k+,-sk  (sk+l-sk> 

n  48  ^  *  (t+,t)dt. 


A  similar  argument  gives 


n2I  II 


n  -  T6  /M1 *°(t+,t)dt. 


n  Cll!  *  -^/M1’°(t-,t)dt, 


//  iV  /M1 •°(t-,t)dt. 
k  5k  a 

The  integrals  over  regions  3k  and  6k  are  slightly  more  complex.  For  region 
3k  we  use  the  Taylor  expansion 

M(tk,v)  -  M(u,v)  =  M(tR,tk)  +  (v-  tk)[M0,1  (tk,xk)  -M0,1(u,xk)j 

=  -(u  -  tk)M]  ’°(yk»tk)  +  (v  -  tk)[M0,1(tk,xk)  -M0,1(u 
where  tk  xk»yk < u <  xk+] >  and  applying  similarly  the  mean  value  theorem  we  obtain 


n2n /  t  /{- 4«'-0(t+,t)+3kH0>,(t,tw-«0’,(t.t-)])dt. 


Likewise 


n2I  //  5 
k  6k  a 


Finally  putting  all  six  pieces  together  we  have  for  the  diagonal  terms 

o  b 


n‘‘H,k  n  "  2^M0,1(t»t-) -r,0J(t,t+)j}dt.  (49) 


Over  each  nondiagonal  rectangle  M  j,  M  has  continuous  partial  derivatives. 
Thus  Taylor  expanding  M (t^.v)  -  M(u,v)  into 


-(u  -  tk)M]  ,0(tk,tj.)  -  ^(u  ~  tk)2M2*°(tk,tJ- )  -  (u  -  tk)(v-  tj)M1,1(tk,t:j) 


plus  third  order  terms,  and  retaining  only  the  lowest  deqree  terms  (M1 ,0  and 
M2’^)  we  obtain,  using  (45)  and  (46) » 


Jk,j 


,1,0 


Sk+1  SU1 


“  -M  ’  (tk,t ^ )  /  J  (u  -  tk)h(u)h(v)dudv 


sk  sj 


sk+l  Sj+1 


2’vl2’0(tk,tj.)  /  /  (u  -  tk)2h(u)h(v)dudv 


Sk  sj 


and  thus 


110  ^ )  ?  n  b(w- ) 

24{m  ’  (tk’tj)7^l(wJ)  ’  M  *  (tk’tj)R^T}(sk+Tsk)(sj+i 


2  l  Jk  i  -  2V  //tM1,0(t,s)h'^t)h(s)  -  M2’°(t,s)£jfj}dtds 
Mj  K’J  n  ^  its  ti  (t)  (50) 


=  -  ^/[M1  *°(s-,s)  -  M1  •V.sHds  - 


24; 1  h(b) 


a  -  a 

where  the  last  equality  is  obtained  integrating  by  parts  as  in  (47). 
Finally,  adding  (49)  and  (50),  we  find 


h(a) 


}  h(s)d 


n  [2nd  term] 


-  24  /([M  (t-,t)  -  M  *  (t+,t)3  -  [M  ’  (t,t-) 


1  ffM1,0(b,t)  M1’0(a,t),  hftxdt 

24  J 1  h(b)  hTal  h^'dt 

3 


W 


2V3YXCh  "  24^AYXf 


where  8yX  and  Ayx  are  defined  following  Eq.  (15). 
Now  (15)  follows  from  (14),  (48)  and  (51). 


-  M  ’  (t,t+)]}dt 

(51) 
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B.  Expressions  for  the  constants  in  (241  and  (25). 


Using  the  method  of  solving  the  integral  equation  (5)  described  in 

[8,  Ch.  Ill,  Sect.  7]  we  find 

2  2 

2  .  ,  °XaN  °NaX 

1  "  xaN  2  7  2a  ’ 

°X  X  N  N 


M  _  °XaN 
^1  ~2  '  9 

°XaN  +  Vx 


^  =  °xaxaNar/qx  *  aN^ax  ~  V 

(aXaX  +  aNaN^°XaN  +  aNaX^  S 


2  2  -a 

°X°N^aN  ‘aX)a(1  ®  ) 

*°XaN  +  °nV  S 


i/here 


>  =  a(l  -  e  ) (°xax  +  aNaN)  +  (1  +  e  ^xaN^X  gfP  ’ 


C.  Proof  of  (30) 

With  the  notation  introduced  in  Appendix  A  we  have  (writing  R  for  Ry) 

,  b 

1st  term  =  4  V  c(t,  )R(t.  -  t . )c(t . )  -  //R(u  -  v)ch(u)ch(v)dudv 
r\  k,j  K  K  J  J  a 

sk+l  sj+l 

=  11  J  [R^-t^yc^J-RCu-vJc^cCv^KuJhCvJdudv 

k,;i  sk  sj 


4 


(52) 


We  first  concentrate  on  the  diagonal  terms  Ik  k  and  split  each  square 
(Wl^VW  int0  itS  upper  ancl  lower  triangles.  Then 


If  [R(0)c2(tk)  -  R(u  -  v)c(u)c(v)]h(u)h(v)dudv 
lower 
triangle 

=  //  [R(0){c2(tj  -  c(u)c(v)} -R  2(0+)/tTvc(u)c(v)  +  o(/u^V)c(u)c(v)]h(u)h(v)dudv 

lower 
triangle 

k  -R^2^(0+)  //  /u-v  ch(u)ch(v)dudv 

lower 
triangle 

=  -R^2  ^  (0+)ch(u.  )ch(v.)  II  /u-v  dudv 
K  1 ower 

triangle 

=  ~  T5  ^  /2^(°+)ch(uk)ch(vk)(sk+i  '  sk)  /z 


for  some  (uk,vk)  in  the  lower  triangle,  where  =  indicates  equality  up  to  higher 
order  terms.  Similarly,  the  integral  over  the  upper  triangle  has  main  term 


4fi(1/2)(0-)ch(ak)ch(bk)(sk+1  -sk)V2 


for  some  (a^.b^)  in  the  upper  triangle.  Using  (41a)  we  then  obtain 
n’/!IIkjk  -  ^  [R(1/2,(0-)  -  R(‘/2)(0+)]/i^  . 


For  the  nondiagonal  terms  Mj,  Taylor-expanding  R  and  c  we  obtain, 
writing  ck  for  c(tk),  ck  for  c'(tk),  etc.. 


!k.j  "  "R^k’ Vtckcj  lf+ckcj  n  +  2°^  ~n~  +  ?°kcj  n  +  CkcjAkAj} 

A.  A.  B.  B. 

‘  R  ^k  "  ^ckcj  ^n”  "n^  +  ckcj^n'AkAj^  +  ckcj^AkAj  "  ~n^ 


B,  B. 


+  higher  order  terms. 


where  Ak,Bk  are  given  in  (45), (46).  All  terms,  except  for  those  retained  below 
_? 

have  rate  n  ,  so  that  the  dominant  term  is 


iA.j”-KLR'(tk'to)c(Vc(tj,(T^> 


and  by  (46)  and  (41a), 


l  !k  i  *  *  —4  l  R~Uk  -  t.)c(tk)C(t.){  J -  +  -y1— }. 

K’J  24  k^j  K  0  k  J  h^(t.)  ti  (t .) 


-2  . 


The  double  sum  above  over  all  terms  with  |t.  -  t.|  >e  has  rate  n"  ,  in  fact 

x  J 


n2|t„-tj*IkJ  * "  ®  |t^i>ent's)c(t>c(sH^t^lldtds 


and  the  limiting  constant  is  finite.  On  the  other  hand  the  same  argument 
2 

shows  that  n  t  ,  I.  .  tends  to  the  same  integral  as  above  but  over 
1  k  j1  e  K,J 

1 1  -  s  |  <  e ,  which  is  not  finite,  as  integration  by  parts  and  R'(0+)=°°  show. 
Thus  its  rate  is  slower  and  it  becomes  the  dominant  term: 


l  ^  i  = - ~a  l  R“(tk  -  t.)c(tk)c(t,){-J —  +  — 

’■  k’J  0/1  n4  I  tk-tJ- 1 <e  k  J  k  3  fT(tk)  h^(tj) 


Mj 


24 


} 


for  each  e  >  0. 

Differentiating  the  expression  of  R  in  terms  of  Rx  and  R^  we  obtain 
2j<2  Rx(t)[Rx(t)]2  Rx(t) 


R~(t)  =  {  a  a  +  — a, — IT}  +  RU^ 

*  [i-rj(t )i/2  n-Rx(T)ra  N 


Assuming  that  Rx(0±)  and  R^(0±)  exist  and  are  finite,  it  follows  from 


1 


1-R2(t)  t+0  aX,0 


that 


FT(t) 


T->0 


2IC 

7T 


a 


v2 

x,o  1 


a^v  1/ 


Thus  for  very  small  e  (e  s  0), 


R~(t,-t.)  -  ixv  , 


1 


k  fcj'  "  3^Y ,  V2  77  T?K  ' 
1  tk'jl 

Also  for  e  «  0  and,  say,  k>j,  we  have 


1 

n 


Li+1 
/  h 
t. 


h(t:)Ati 


h(tj)At1 


for  i=j,...,  k,  and  thus  summing  ud 


and 


n  =  h(tj)(tk'tj)’ 

1  h3/2(t.)nV2 

R  -  gay>1/2  {k_.)3/2 


Likewise,  when  j  >  k. 


,  h3/=(t.  )nV2 

R  (V  V  “  3°Y,‘A  "^’A 


Using  also  c(tk)  =  c(tj),  h(tk)  =  h(tj ) ,  we  obtain  that  for  e«  0, 

r  r  1  ,  r  1  c2(V 

k^j  k,J  96nS/2  aY,1/2  0<tk-t  —  < ^  ^ 


c2(tk) 


j<£  (k-j)/2  ir(tV)  0<tj-tk<e  (j-k)3/z  h  /z  (tk) 


As  the  sum  extends  over  0  <  k  -  j  <  nh(t^ )e  and  0  <  j  -  k  <  nh(tk)e  we  obtain  in 
the  limit,  in  view  of  (41a), 


n’/2 1  \  j 

K’J 


=  -  ^aYjy2d3/2)Jc2h  /z. 


(5 


These  arguments  for  t  z  0  can  be  made  precise  by  writing  upper  and  lower 
bounds  in  terms  of  e,  and  then  letting  eiO  as  these  hold  for  all  e>0. 


Combining  (53)  with  (55)  gives  (30). 
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D.  Proof  of  (34)  and  (35) 


We  have 


RqX(t)  =  ELQ(X(0)  Q(X(t) )]  =  //  Q(x)  Q(y)  p(x,y;  Rx(i))  dx  dy 

-oo 

where  p(x,y;p)  is  the  bivariate  normal  density  with  zero  means,  variances  one, 
and  correlation  coefficient  p.  Thus,  by  Price's  theorem  [7],  we  obtain 

R0X(t)  =  ~3R^  *  RX(x) 


=  Rx(t)  //  p(x,y;  Rx(x))dQ(x)  dQ(y) 

”  R’<(T>kj=2  p(xk-xj;Rx(T))  (vk  -  Vi1  <vj  -  yj-i* 


Rx(t) 


2 

I  (yk  -  yk  i)  exp  [ - 

<=2  K  K_l  1  + 


k=2  1  +  Rx(t) 


\]ij,2(yk-*k-i)<Y)'j-i>exp[- 


Xk  +  xj  '  2xkxjRX(T) 


20  -  RJ(t)) 


If  follows  that 


2>  Rqx(t)  -  W0) 

c  n+^  =  iim  _ _ JM 


U  (0+)  =  lim  - 

U*  x+0  fT 


Rx(0+) 


=  lim  2/t-  R«y(t) 
t+0 


i{-2Ry'(0t))‘/!  k=2 


L  <yk- 


<Vk-Vk,»2e',yk+y-,)  * 


Lv»y 


=  -  {-R^(0+)/tt}  /2Bg  =  -  {axy(27r)}/230. 


Similarly 


(l) 


Rn?  (0-)  =  {av  0/(2tt)}/2  B 


'QX 


X,0' 


and  thus 


a, 


QX.1/2 


(?) 

Rox  (0-> 


(i>  V 

Vx  <°*>  ■  l2“x,o'">  V 


This  establishes  (34).  For  (35),  differentiating  once  the  expression 


above  for  RqX(t),  we  obtain 


!  RJ(t)  _  Rx(t)[Rx(t)]2  N 


"®<T)  =  +  D^(t)]v,  Hkyvk  -  VP  «»[-  hrtct] 


+  I  (yk-yk_1)(yi  -y,-  Jexp[-  -  - ]  } 

Mj=2  k  k  1  J  J  1  2(1-RJ(t)) 


Ry'(l)  n 


2  r  xk  -|  xkRX^T  ^ 


+  - - - 17  (  I  (yk  -  y.  , )  exo[-  -lTp--r ■  t 

2tt  [  1  -  Ry  (  t  )  3  k=2  k  k_1  l+Ry(x) 


Xu'  [1+Rx(t)]‘ 


N 


O  o 


x''+x%2x,  x.R  (t)  d'(t)  0  ,  ? 

*  l  iv'V]fc"i.i)eMi--;]-?',  J  j[(xr+x-;>RjT)-x  x ,(hr‘(t))]  j 

kXj=2  K  k  1  -  1  '  2  1-Rj  t  I-RJ(t)  J  y  k  J  x 


and  using  (54a)  we  have  as  x  ->  0, 

|t|3V  (x)  +  1  (aX,0x2(  1  xV2  5  ,  \2  ~xk 

11  RQXlx)  2tt '  2  ]  V  J  ,  Vyk  yk-l}  e 


"x?/2 


*X,0  k=2 


4^ax,o/ ^2tt) )  /2bq  if'qx , y2 


establishing  (35). 
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Fig.  2.  Plot  of  asymptotic  mse  (in  dB)  vs.  number  of  samples  n  (n  :  2-21) 


No  noise  case  -  dash-dot 

Noise  present  simple  coefficients-  solid  line 
Noise  present  optimal  coefficients-  solid  line 
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Fig.  3.  Plots  of  mean  square  error  for  the  following  schemes 


Uniform  sampling,  optimal  coefficients 
Uniform  sampling,  adjusted  simple  coefficients 
Optimal  regular  sampling,  optimal  coefficients 
Optimal  median  sampling,  adjusted  simple  coefficients 
Uniform  sampling,  nonadjusted  simple  coefficients 
Asymptotic  expression  for  uniform  sampling,  optimal 
coefficients  or  uniform  sampling  adjusted  simple 
coefficients 
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